Round 1: Technical
✅ Tell me about your work experience and any recent projects you were part of.
✅ Questions related to client and cluster deploy modes.
✅ Explain the spark-submit command and the configurations you applied in your project.
✅ How can you optimize Spark code for better performance?
✅ How do you handle data skewness in Spark?
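One common mitigation is key salting; below is a minimal sketch where the table paths, the skewed join key customer_id, and the salt count are all illustrative assumptions. On Spark 3.x, enabling adaptive query execution (spark.sql.adaptive.skewJoin.enabled=true) is usually worth trying before hand-rolling this.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("skew-salting-sketch").getOrCreate()

num_salts = 8                                     # tune to the observed skew
fact = spark.read.parquet("/data/fact_orders")    # hypothetical large, skewed table
dim = spark.read.parquet("/data/dim_customers")   # hypothetical smaller table

# Add a random salt bucket to the skewed side so one hot key is split across partitions.
salted_fact = fact.withColumn("salt", (F.rand() * num_salts).cast("int"))

# Explode the other side so every salt bucket can still find its matching rows.
salted_dim = dim.withColumn("salt", F.explode(F.array([F.lit(i) for i in range(num_salts)])))

# Join on the original key plus the salt, then drop the helper column.
joined = salted_fact.join(salted_dim, ["customer_id", "salt"], "inner").drop("salt")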
✅ How would you handle a situation where the number of columns in your data source keeps on increasing or decreasing?
✅ Write Spark code to process such data, assuming the source files are in JSON format and stored in Azure Data Lake Storage (ADLS).
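A minimal sketch of one approach, assuming an ADLS Gen2 source and a Delta Lake target (the storage account, containers, and paths are placeholders): let Spark infer the JSON schema on read, and let the Delta table absorb new columns on write.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-schema-drift").getOrCreate()

source_path = "abfss://raw@<storage_account>.dfs.core.windows.net/events/"          # placeholder
target_path = "abfss://curated@<storage_account>.dfs.core.windows.net/events_delta/"  # placeholder

# Schema inference picks up newly added JSON fields automatically; removed fields
# simply come through as nulls for the affected files.
df = spark.read.option("multiLine", "true").json(source_path)

# mergeSchema lets the Delta target evolve instead of failing when new columns appear.
(df.write
   .format("delta")
   .mode("append")
   .option("mergeSchema", "true")
   .save(target_path))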
✅ 2-3 scenario-based questions related to Azure Data Factory (ADF).
✅ How would you implement Slowly Changing Dimension (SCD) Type 2 in your organization?
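A hedged sketch of one way to do SCD Type 2 on Delta Lake, assuming a dim_customer table with is_current/start_date/end_date columns and that only the address attribute is tracked for change; all table names, paths, and columns are illustrative.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

updates = spark.read.parquet("/staging/customers")   # hypothetical incoming batch
dim_path = "/warehouse/dim_customer"                 # hypothetical Delta SCD2 table
dim = DeltaTable.forPath(spark, dim_path)

# 1. Keep only incoming rows that are brand new or whose tracked attribute changed.
current = spark.read.format("delta").load(dim_path).filter("is_current = true")
changed = (updates.alias("s")
           .join(current.alias("t"), F.col("s.customer_id") == F.col("t.customer_id"), "left")
           .filter(F.col("t.customer_id").isNull() | (F.col("s.address") != F.col("t.address")))
           .select("s.*"))

# 2. Expire the existing current versions of those keys.
(dim.alias("t")
    .merge(changed.alias("s"), "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(set={"is_current": "false", "end_date": "current_date()"})
    .execute())

# 3. Append the new versions as the current rows.
(changed.withColumn("start_date", F.current_date())
        .withColumn("end_date", F.lit(None).cast("date"))
        .withColumn("is_current", F.lit(True))
        .write.format("delta").mode("append").save(dim_path))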
✅ One scenario was given where you had to identify:
   - The type of schema used.
   - The tables and relationships between them.
   - The constraints applied to those tables.
✅ What is one key difference between RDDs, DataFrames, and Datasets in Spark?
✅ What is the difference between data lake storage, data warehouses, and Delta Lake?
✅ What is the difference between temp views and temp tables, and where is each used?
✅ Explain SQL triggers and how SQL execution works.
✅ Write Spark code to create a DataFrame from a CSV file where the delimiter is | instead of a comma.
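A minimal sketch (the file path and header option are assumptions):
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pipe-delimited-csv").getOrCreate()

df = (spark.read
      .option("header", "true")           # assuming the first row carries column names
      .option("inferSchema", "true")
      .option("delimiter", "|")           # the key change: pipe instead of comma
      .csv("/data/input/employees.csv"))  # hypothetical path

df.show()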
✅ Write Spark code to find the highest salary for each department using the following dataset (a sample sketch follows the dataset):
data = [
[1001, 'Marlania', 92643, 1],
[1002, 'Briana', 87202, 1],
[1003, 'Maysha', 70545, 1],
[1004, 'Jamacia', 65285, 1],
[1005, 'Kimberli', 51407, 2],
[1006, 'Lakken', 88933, 2],
[1007, 'Micaila', 82145, 2],
[1008, 'Gion', 66187, 2],
[1009, 'Latoynia', 55729, 3],
[1010, 'Shaquria', 52111, 3],
[1011, 'Tarvares', 82979, 3],
[1012, 'Gabriella', 74132, 4],
[1013, 'Medusa', 72551, 4],
[1014, 'Kubra', 55170, 4]
]
columns = ['emp_id', 'emp_name', 'salary', 'emp_dep_id']
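One possible sketch, reusing the data and columns lists above: rank salaries within each department with a window and keep the top row per emp_dep_id.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("highest-salary-per-dept").getOrCreate()
df = spark.createDataFrame(data, columns)

# Rank employees within each department by salary, highest first.
w = Window.partitionBy("emp_dep_id").orderBy(F.col("salary").desc())
highest = (df.withColumn("rnk", F.dense_rank().over(w))
             .filter("rnk = 1")
             .drop("rnk"))
highest.show()

# If only the amount per department is needed (not the employee row):
df.groupBy("emp_dep_id").agg(F.max("salary").alias("max_salary")).show()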
✅ Write Spark code to calculate the average price for products based on the following tables (see the sketch after the expected output):
Prices Table:
+------------+------------+------------+-------+
| product_id | start_date | end_date   | price |
+------------+------------+------------+-------+
| 1          | 2019-02-17 | 2019-02-28 | 5     |
| 1          | 2019-03-01 | 2019-03-22 | 20    |
| 2          | 2019-02-01 | 2019-02-20 | 15    |
| 2          | 2019-02-21 | 2019-03-31 | 30    |
+------------+------------+------------+-------+
Units Sold Table:
+------------+---------------+-------+
| product_id | purchase_date | units |
+------------+---------------+-------+
| 1          | 2019-02-25    | 100   |
| 1          | 2019-03-01    | 15    |
| 2          | 2019-02-10    | 200   |
| 2          | 2019-03-22    | 30    |
+------------+---------------+-------+
Expected Output:
+------------+---------------+
| product_id | average_price |
+------------+---------------+
| 1          | 6.96          |
| 2          | 16.96         |
+------------+---------------+
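A possible sketch for the question above: join each sale to the price band whose date range covers the purchase date, then take the units-weighted average, rounded to two decimals.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("avg-product-price").getOrCreate()

prices = (spark.createDataFrame(
    [(1, "2019-02-17", "2019-02-28", 5),
     (1, "2019-03-01", "2019-03-22", 20),
     (2, "2019-02-01", "2019-02-20", 15),
     (2, "2019-02-21", "2019-03-31", 30)],
    ["product_id", "start_date", "end_date", "price"])
    .withColumn("start_date", F.to_date("start_date"))
    .withColumn("end_date", F.to_date("end_date")))

units_sold = (spark.createDataFrame(
    [(1, "2019-02-25", 100), (1, "2019-03-01", 15),
     (2, "2019-02-10", 200), (2, "2019-03-22", 30)],
    ["product_id", "purchase_date", "units"])
    .withColumn("purchase_date", F.to_date("purchase_date")))

# Each sale picks up the price that was valid on its purchase date.
joined = (units_sold.join(
              prices,
              (units_sold.product_id == prices.product_id)
              & (units_sold.purchase_date >= prices.start_date)
              & (units_sold.purchase_date <= prices.end_date))
          .select(units_sold.product_id, "units", "price"))

# average_price = sum(price * units) / sum(units), rounded to two decimals.
result = (joined.groupBy("product_id")
          .agg(F.round(F.sum(F.col("price") * F.col("units")) / F.sum("units"), 2)
               .alias("average_price")))
result.show()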
✅ Write Python code to merge two strings alternately (a sample solution follows the examples). For example:
Input:
word1 = "abc"
word2 = "pqr"
Output:
"apbqcr"
Input:
word1 = "ab"
word2 = "pqrs"
Output:
"Apbqrs"
Round 2: Techno Managerial
✅ What are your skillsets, roles, and responsibilities in your current project?
✅ Consider a pipeline where you initially performed a full data load, but now you want to load data incrementally. How would you implement this change using Databricks?
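A hedged sketch of one common pattern: track a high-water mark on a last-modified column and MERGE only the newer rows into a Delta table instead of overwriting it. The paths, the order_id key, and the last_updated column are assumptions; Databricks Auto Loader or Delta change data feed are alternative building blocks for the same idea.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental-load-sketch").getOrCreate()

target_path = "/mnt/curated/orders"                  # hypothetical Delta target
target = DeltaTable.forPath(spark, target_path)

# High-water mark: the newest timestamp already present in the target
# (assumes at least one prior full load, so the value is not null).
last_loaded = (spark.read.format("delta").load(target_path)
               .agg(F.max("last_updated")).collect()[0][0])

# Read only source rows that arrived after the watermark.
incremental = (spark.read.parquet("/mnt/raw/orders")          # hypothetical source
               .filter(F.col("last_updated") > F.lit(last_loaded)))

# Upsert the new/changed rows instead of reloading everything.
(target.alias("t")
       .merge(incremental.alias("s"), "t.order_id = s.order_id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())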
✅ Scenario-based questions related to Spark.
✅ Scenario-based questions related to handling schema evolution.
✅ Write a SQL query to retrieve employees whose salary is greater than the average salary of their department (a sample query follows the table definitions).
Emp Table: empid, emp_name, salary, deptid
Dept Table: deptid, dept_name
Expected Output: empid, emp_name, salary, dept_name
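One way to write it, run through spark.sql so it stays in PySpark; it assumes an active SparkSession named spark and that emp and dept are available as tables or temp views.
result = spark.sql("""
    SELECT e.empid,
           e.emp_name,
           e.salary,
           d.dept_name
    FROM   emp  e
    JOIN   dept d ON e.deptid = d.deptid
    WHERE  e.salary > (SELECT AVG(e2.salary)      -- correlated per-department average
                       FROM   emp e2
                       WHERE  e2.deptid = e.deptid)
""")
result.show()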
✅ Write a SQL query to find the numbers that appear at least three times consecutively in a table (a sample query follows the expected output).
Log Table:
+----+-----+
| id | num |
+----+-----+
| 1  | 1   |
| 2  | 1   |
| 3  | 1   |
| 4  | 2   |
| 5  | 1   |
| 6  | 2   |
| 7  | 2   |
+----+-----+
Expected Output: 1
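One possible query, again via spark.sql (assuming the table above is registered as logs): compare each num with the two previous rows using LAG.
result = spark.sql("""
    SELECT DISTINCT num
    FROM (
        SELECT num,
               LAG(num, 1) OVER (ORDER BY id) AS prev_1,
               LAG(num, 2) OVER (ORDER BY id) AS prev_2
        FROM logs
    ) t
    WHERE num = prev_1 AND num = prev_2
""")
result.show()   # returns 1 for the sample data above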
✅ Explain how LEAD and LAG functions work in SQL with an example.
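A tiny self-contained illustration (hypothetical monthly revenue data): LAG pulls the value from the previous row and LEAD from the next row within the ordered window, which is handy for month-over-month comparisons.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lead-lag-demo").getOrCreate()

sales = spark.createDataFrame(
    [("2024-01", 100), ("2024-02", 120), ("2024-03", 90)],
    ["month", "revenue"])
sales.createOrReplaceTempView("sales")

spark.sql("""
    SELECT month,
           revenue,
           LAG(revenue)  OVER (ORDER BY month) AS prev_month_revenue,
           LEAD(revenue) OVER (ORDER BY month) AS next_month_revenue
    FROM sales
""").show()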
Round 3: HR
✅ Discussion around my experience and projects, plus some resume-based questions.
✅ What are you expecting in your next job role?
✅ How soon can you join the company, and what is your preferred location?